1.  Introduction


Live-fire  testing  makes  evident  the  lethality  of  a  munition  or  the  vulnerability  of  a  target. 
In previous and current live-fire test programs, the targets have been armored vehicles. For
munitions,  live-fire  testing  exhibits  munition  lethality  by  showing  the  ability  of  a  munition  to 
destroy  armored  vehicles.  For  armored  vehicles,  live-fire  testing  exhibits  vehicle  vulnerability 
by  showing  how  susceptible  a  vehicle  is  to  damage  when  struck  by  a  munition.  Of  all 
components  in  a  vehicle,  a  certain  subset  is  considered  critical.  Tactical  functions  will  be 
degraded  when  an  encounter  between  a  munition  and  an  armored  vehicle  destroys  critical 
components or, at the very least, renders them nonfunctional or degraded. In each firing,
the set of critical components rendered nonfunctional constitutes the experimental outcome
of the test shot.

Before testing commences, predictions are made by using the vulnerability code SQuASH
(Stochastic Quantitative Assessment of System Hierarchies). SQuASH simulates the
interaction between a munition and an armored vehicle, and varies the capabilities of the
resulting damage mechanisms, primarily penetration and spall, that affect critical components. SQuASH
also  conducts  sampling  to  produce  vectors  of  component  damage.  Each  element  of  the  vector, 
representing  some  critical  component,  identifies  whether  it  has  survived  or  has  been  rendered 
nonfunctional  by  the  action  of  damage  mechanisms.  Every  vector  represents  some  state  of 
damage.  So,  any  state  or  vector  identifies  a  specific  combination  of  damaged  components. 
For  each  shot,  generally  1000  trials  are  performed  to  produce  distinct  states  or  vectors  of 
component  damage  along  with  their  observed  likelihood  of  occurrence. 

From  SQuASH  predictions,  each  component-damage  vector  represents  a  hypothetical 
experimental  outcome.  Thus,  test  results  can  be  compared  with  predictions.  As  part  of  a 
consistency  examination,  test  results  are  identified  in  the  list  of  predicted  component-damage 
vectors.  For  some  shots,  no  match  may  occur  between  test  results  and  predictions.  In  such  a 
situation,  there  may  be  deficiencies  in  modeling.  On  the  other  hand,  if  modeling  is  accurate, 
an  insufficient  number  of  samples  for  a  comparison  may  have  been  selected  in  the  predictions. 
If  too  few  samples  have  been  taken,  a  question  may  be  raised  on  how  many  samples  are  needed 
to  ensure  that  observing  all  possible  component-damage  states  becomes  likely. 

When  a  munition  perforates  the  armored  shell  of  a  vehicle,  different  damage  mechanisms 
produced  by  this  interaction  may  render  various  components  nonfunctional.  Consider  the 
case  where  independence  can  be  assumed  between  every  possible  pair  of  critical  components. 
In  this  case,  probabilities  for  component  loss  are  also  assumed  to  be  precisely  known.  Then, 
the  sampling  question  surrenders  to  mathematical  investigation.  For  such  an  idealized  case, 
this  report  demonstrates  the  nature  of  the  sampling  problem.  An  illumination  of  the  problem 
is  presented.  However,  no  solutions  are  provided.  Such  solutions  are  beyond  the  scope  of 
this  report. 
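
The idealized case can be made concrete with a short sketch. The Python fragment below is not SQuASH and makes no claim about how SQuASH samples; it only assumes independent components with known loss probabilities (the values in `loss_probs` and the `test_outcome` vector are hypothetical), tallies the distinct component-damage vectors produced by 1000 trials, and checks whether a given outcome appears among them.

```python
import random
from collections import Counter

def sample_damage_vectors(loss_probs, trials, rng):
    """Tally component-damage vectors from independent Bernoulli draws.

    Each vector element is 1 if the component is rendered nonfunctional
    and 0 if it survives.
    """
    tally = Counter()
    for _ in range(trials):
        vector = tuple(1 if rng.random() < p else 0 for p in loss_probs)
        tally[vector] += 1
    return tally

# Hypothetical loss probabilities for five critical components.
loss_probs = [0.9, 0.9, 0.9, 0.9, 0.9]
rng = random.Random(1)            # fixed seed so the run is repeatable
tally = sample_damage_vectors(loss_probs, trials=1000, rng=rng)

total_states = 2 ** len(loss_probs)
print(f"distinct vectors observed: {len(tally)} of {total_states}")

# A hypothetical test outcome: only the first component is lost.
test_outcome = (1, 0, 0, 0, 0)
print("test outcome among predictions:", test_outcome in tally)
```

With loss probabilities this far from 0.5, some vectors are rare, so a run of 1000 trials may well miss the chosen outcome even though the model that generated the samples is exactly right.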

A  problem  related  to  games  of  chance  was  proposed  to  Pascal  by  the  Chevalier  de  Mere. 
This  problem  became  the  seed  for  letter  correspondence  between  Pascal  and  Fermat  that 
eventually laid the foundation for the mathematical theory of probability.* Throughout its
development, gambling and probability theory have been intertwined. In a similar vein, this
sampling question is related to a gambling problem that during the mid-1980s attracted the
attention of this author.

*E. T. Bell, Men of Mathematics, 1937, pp. 85-89.


2.  An  Anecdotal  History 

In  1985,  this  author  shared  an  office  with  three  technicians.  At  that  time,  there  was 
more  than  usual  interest  in  the  Maryland  LOTTO  game,  since  no  one  had  won  the  lottery 
in several weeks. Hence, the jackpot had become excessively large. One day during lunch,
our  branch  chief  came  into  our  office  and  spoke  primarily  to  two  of  the  technicians.  This 
author’s  interest  was  diverted  from  the  then  current  efforts  to  what  our  branch  chief  was 
saying. He had gathered winning numbers from the previous 14 weekly drawings. Tabulation
of  these  data,  which  he  gave  to  the  two  technicians  but  which  has  since  disappeared,  listed  the 
drawn  LOTTO  numbers  from  1  through  40,  inclusive,  with  their  corresponding  frequencies 
of  occurrence.  An  interesting  observation,  which  our  branch  chief  pointed  out,  was  that  there 
were six numbers that had not been drawn. So, he told the technicians that they should play
those six numbers in the next LOTTO game: a typical gambler's folly! This author does not
believe  that  our  branch  chief  actually  thought  those  six  numbers  would  be  drawn,  but  rather 
he  told  them  that  to  drive  both  technicians  nuts  about  a  sure  winner.  (On  the  other  hand, 
one  never  really  knows  about  branch  chiefs!)  By  the  way,  only  one  or  maybe  two  of  those 
numbers  that  had  not  been  chosen  in  the  last  14  games  were  selected  in  the  next  drawing. 

Although  not  a  gambler  and,  hence,  not  a  player  of  the  Maryland  LOTTO,  this  author  was 
still  interested  in  this  frequency  tabulation  due  to  a  personal  inclination  towards  numbers 
and  things  mathematical.  After  the  branch  chief  left  our  office,  the  technicians  and  this 
author  examined  the  frequencies.  For  some  reason,  which  has  since  been  forgotten,  one  of 
the  technicians  had  to  leave  the  office  and,  after  his  departure,  this  author  suggested  to  the 
other  technician  that  the  fairness  of  the  lottery  could  be  examined  by  applying  a  statistical 
procedure to these data. In a single LOTTO game, 6 numbered balls are drawn from 40
balls  without  replacement.  After  each  game,  the  selected  balls  are  replaced.  Over  many 
games,  the  likelihood  of  drawing  a  specific  numbered  ball  should  be  the  same  for  all  balls. 
Selecting  the  Chi-Square  test  was  the  obvious  choice  to  examine  whether  each  of  the  40 
values  had  the  same  probability  of  being  drawn,  as  opposed  to  some  of  them  having  different 
probabilities.  Expected  frequency  per  lottery  number  was  2.1,  because  there  was  a  sample 
of  size  84  (i.e.,  6  numbers  per  drawing  and  14  weekly  drawings)  and  40  ordinal  values.  In 
applying  the  Chi-Square  test,  a  minimal  expected  frequency  of  five  is  usually  required  to 
have  confidence  in  the  outcome  of  this  statistical  procedure.  To  obtain  this  confidence,  the 
40  ordinal  values  were  grouped  into  10  classes  of  length  4  (i.e.,  ordinal  values  1  through  4 
were  placed  in  one  class,  5  through  8  were  placed  in  the  next,  etc.).  The  frequency  of  each 
class  became  the  sum  of  frequencies  of  its  corresponding  four  members.  Now,  expected  class 
frequency  became  8.4.  The  result  from  applying  the  Chi-Square  test  supported  a  claim  of 
equal  likelihood  at  the  5%  level  of  significance.  Thus,  fairness  of  the  lottery  was  supported 
by  this  statistical  measure. 
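
The grouping step can be illustrated with a minimal sketch. Since the original tabulation has disappeared, the fragment below uses simulated stand-in data (14 drawings of 6 balls from 40), not the actual lottery results; the function name and the seed are illustrative assumptions. It groups the 40 values into 10 classes of 4, computes the Chi-Square statistic against an expected class frequency of 8.4, and compares it with the tabulated 5% critical value for 9 degrees of freedom.

```python
import random

def grouped_chi_square(draws, n_values=40, class_width=4):
    """Chi-square statistic for equal likelihood after grouping values.

    `draws` is a flat list of drawn numbers (1..n_values). Values are
    grouped into consecutive classes of `class_width` so that the expected
    class frequency is large enough for the test.
    """
    n_classes = n_values // class_width
    observed = [0] * n_classes
    for value in draws:
        observed[(value - 1) // class_width] += 1
    expected = len(draws) / n_classes
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, observed, expected

# Stand-in data: 14 simulated drawings of 6 balls from 40 without replacement.
rng = random.Random(1985)
draws = [ball for _ in range(14) for ball in rng.sample(range(1, 41), 6)]

statistic, observed, expected = grouped_chi_square(draws)
CRITICAL_5_PERCENT_9_DF = 16.919   # tabulated chi-square value, 9 degrees of freedom
print(f"expected class frequency: {expected:.1f}")
print(f"chi-square statistic: {statistic:.3f}")
print("equal likelihood rejected:", statistic > CRITICAL_5_PERCENT_9_DF)
```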

Aside from examining whether the distribution of drawn LOTTO numbers was fair,
another aspect of these data was intriguing. A question arose in this author’s mind that
approximated the LOTTO situation: “Is it reasonable not to observe 6 distinct values when
sampling 40 ordinal values with replacement 84 times?” as well as a more general question:
“What is the distribution of distinct values not observed when sampling m values with
replacement N times?” These questions were more difficult than the question of fairness.
Over  a  short  period  of  time,  this  author  was  sufficiently  interested  in  these  questions  to 
investigate  the  distribution  of  number  of  unobserved  ordinal  values. 


3.  Mathematical  Structure 


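In the more general case, this gambling problem is equivalent to taking samples of size
$N$ with replacement from $m$ lottery balls or jars or bins. It has a multinomial sampling
distribution,

$$\Pr[x_1, x_2, \ldots, x_m] = \frac{N!}{x_1!\, x_2! \cdots x_m!}\; p_1^{x_1} p_2^{x_2} \cdots p_m^{x_m} \tag{1}$$

where both $x_1 + x_2 + \cdots + x_m = N$ and $p_1 + p_2 + \cdots + p_m = 1$. The multinomial distribution
can be collapsed into any of $m$ binomial distributions,

$$\Pr[x_k] = \frac{N!}{x_k!\,(N - x_k)!}\; p_k^{x_k} (1 - p_k)^{N - x_k} \tag{2}$$

for $k = 1, 2, \ldots, m$. Consider the $m$ Bernoulli random variables $T_k$ defined as

$$T_k = \begin{cases} 0, & x_k \neq 0 \\ 1, & x_k = 0. \end{cases} \tag{3}$$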


Calculation of means and variances for these random variables becomes easier by using
the binomial distribution, expression (2), rather than the original multinomial distribution,
expression (1). Bernoulli random variables, $T_k$, have means and variances given by

$$\mu[T_k] = (1 - p_k)^N \tag{4}$$

$$\sigma^2[T_k] = (1 - p_k)^N \left[ 1 - (1 - p_k)^N \right]. \tag{5}$$

Denote $S$ as the sum of the $m$ Bernoulli random variables, i.e.,

$$S = \sum_{k=1}^{m} T_k, \tag{6}$$

then $S$ has mean and variance given by

$$\mu[S] = \sum_{k=1}^{m} (1 - p_k)^N \tag{7}$$

$$\sigma^2[S] = \sum_{k=1}^{m} \sigma^2[T_k] + \sum_{i=1}^{m} \sum_{\substack{j=1 \\ i \neq j}}^{m} \mathrm{Cov}[T_i, T_j]. \tag{8}$$

Bernoulli random variables $T_i$ and $T_j$, $i \neq j$, are not independent, because both the sums of
$x$'s and $p$'s are fixed. Hence, covariance terms do not necessarily vanish.

Difficulties exist in using the multinomial distribution to determine expressions for the
covariance terms. Similar to what was done in determining means and variances for the
Bernoulli random variables $T_k$, expression (1) can be collapsed into any of $m(m - 1)/2$
trinomial distributions,

$$\Pr[x_i, x_j] = \frac{N!}{x_i!\, x_j!\, (N - x_i - x_j)!}\; p_i^{x_i} p_j^{x_j} (1 - p_i - p_j)^{N - x_i - x_j}. \tag{9}$$

Covariance terms can be expressed as

$$\mathrm{Cov}[T_i, T_j] = \sum_{a=0}^{1} \sum_{b=0}^{1} (a - \mu[T_i])(b - \mu[T_j]) \Pr[T_i = a \text{ and } T_j = b]. \tag{10}$$

By using the trinomial distributions, conditions for evaluating the joint probabilities can be
determined,

$$\begin{array}{l|ll}
 & T_i = 0 & T_i = 1 \\ \hline
T_j = 0 & x_i = 1, 2, \ldots;\ x_j = 1, 2, \ldots;\ x_i + x_j \le N & x_i = 0;\ x_j = 1, 2, \ldots, N \\
T_j = 1 & x_i = 1, 2, \ldots, N;\ x_j = 0 & x_i = 0;\ x_j = 0
\end{array}$$

and upon evaluation the probabilities become

$$\begin{array}{l|ll}
 & T_i = 0 & T_i = 1 \\ \hline
T_j = 0 & 1 + (1 - p_i - p_j)^N - \mu[T_i] - \mu[T_j] & \mu[T_i] - (1 - p_i - p_j)^N \\
T_j = 1 & \mu[T_j] - (1 - p_i - p_j)^N & (1 - p_i - p_j)^N
\end{array}$$

Using the joint probabilities, expression (10) simplifies to

$$\mathrm{Cov}[T_i, T_j] = (1 - p_i - p_j)^N - \mu[T_i]\,\mu[T_j]. \tag{11}$$

Thus, the variance of $S$, expression (8), becomes

$$\sigma^2[S] = \sum_{k=1}^{m} \sum_{\substack{r=1 \\ k \neq r}}^{m} (1 - p_k - p_r)^N - \mu[S]\,(\mu[S] - 1). \tag{12}$$
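
Expressions (7) and (12) translate directly into a few lines of Python. The sketch below is one possible transcription, with a hypothetical function name; it evaluates the mean and variance of S for the lottery setting of 40 equally likely values sampled 84 times, so the result can be compared against the theoretical entries of Table 2.

```python
import math

def unobserved_mean_variance(probs, n_samples):
    """Mean and variance of S, the number of values never drawn.

    Uses expression (7) for the mean and expression (12) for the variance.
    """
    mean = sum((1.0 - p) ** n_samples for p in probs)
    pair_sum = sum(
        (1.0 - pk - pr) ** n_samples
        for k, pk in enumerate(probs)
        for r, pr in enumerate(probs)
        if k != r
    )
    variance = pair_sum - mean * (mean - 1.0)
    return mean, variance

# Lottery problem: 40 equally likely values sampled 84 times.
mean, variance = unobserved_mean_variance([1.0 / 40] * 40, 84)
print(f"mean = {mean:.3f}, standard deviation = {math.sqrt(variance):.3f}")
```

The double loop over pairs is quadratic in the number of values, which is acceptable for 40 values but would need restructuring for very large state counts.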

Recall,  in  the  lottery  problem,  there  was  a  concern  about  whether  it  was  reasonable  that 
6 out of 40 numbers would remain unselected after 84 draws. As a test of the mathematical
theory, a simulation was conducted to mimic the lottery problem. The computerized
simulation  involved  sampling  equally  likely  ordinal  values  between  1  and  40,  inclusive,  as 
representatives  for  lottery  balls,  with  replacement  84  times.  After  sampling  was  completed, 
a  count  of  the  number  of  values  not  drawn  was  made.  The  combined  sampling  and  counting 
session  was  then  replicated  999  times  for  a  total  of  1000  sessions.  These  counts  were  then 
tabulated  as  frequencies  closely  resembling  the  distribution  of  number  of  unobserved  lottery 
balls. Results from simulating the lottery problem are shown in Table 1. When 6 lottery
balls  had  remained  unselected  after  14  weekly  drawings,  simulation  results  indicate  such  an 
outcome  does  appear  reasonable. 
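
A simulation of this kind can be sketched in a few lines. The fragment below is an illustrative reconstruction of the procedure just described, not the original program; the function names and the seed are assumptions, and a different random number stream will give frequencies that differ in detail from those reported.

```python
import random
from collections import Counter

def count_unobserved(n_values, n_samples, rng):
    """Sample with replacement and count the values never drawn."""
    seen = {rng.randint(1, n_values) for _ in range(n_samples)}
    return n_values - len(seen)

def simulate_sessions(n_values=40, n_samples=84, sessions=1000, seed=0):
    """Tabulate the number of unobserved values over many sessions."""
    rng = random.Random(seed)
    return Counter(count_unobserved(n_values, n_samples, rng)
                   for _ in range(sessions))

frequencies = simulate_sessions()
for count in sorted(frequencies):
    print(f"{count:2d} {frequencies[count]:4d}")
```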

Table 1. Results from Simulating the Lottery Problem

Counts    Frequencies
   0            6
   1           18
   2           56
   3          158
   4          201
   5          209
   6          175
   7          108
   8           51
   9           15
  10            3


Using the frequencies, values for central tendency and variability were calculated. In
comparing estimates, values from simulation differ from their theoretical values in only the
second decimal place. Magnitude of differences is well within inherent randomness of the
simulation. The simulated and theoretical values are shown in Table 2.

Table 2. Simulated and Theoretical Estimates for the Lottery Problem

                      Observed    Expected
Average                4.832       4.769
Standard Deviation     1.769       1.735
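
The observed column can be rechecked from the tabulated frequencies. The short sketch below hard-codes the Table 1 frequencies and recomputes the average and standard deviation; the use of the n - 1 divisor is an assumption, since the report does not say which divisor was applied.

```python
import math

# Frequencies of the counts 0 through 10 from Table 1 (1000 sessions).
frequencies = [6, 18, 56, 158, 201, 209, 175, 108, 51, 15, 3]

sessions = sum(frequencies)
average = sum(c * f for c, f in enumerate(frequencies)) / sessions
# Sample variance with the usual n - 1 divisor (an assumption; the report
# does not say which divisor was used).
variance = sum(f * (c - average) ** 2
               for c, f in enumerate(frequencies)) / (sessions - 1)
print(f"sessions = {sessions}")
print(f"average = {average:.3f}")
print(f"standard deviation = {math.sqrt(variance):.3f}")
```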

Now, $S$ is the count of the number of lottery balls or bins or whatever that have not
been observed. Using expressions (7) and (12), some observations can be made concerning
the behavior of $S$. When $N = 1$, $\mu[S] = m - 1$ and $\sigma^2[S] = 0$, because something will
always be selected with a sample of size one, thereby leaving $m - 1$ things not observed. So,
with $N = 1$, $S$ will always have a single-valued distribution. As $N$ increases without bound,
both $\mu[S]$ and $\sigma^2[S]$ will asymptotically approach zero, because as sample size increases the
chance that something will remain unobserved becomes less likely. So, as $N$ grows large, the
distribution of $S$ will approach in the limit a single-valued distribution. Between the two
extremes, $S$ will have a mean that decreases with increasing sample size. $S$ will be nearly
constant at both small and large values for $N$, but will have more variability at intermediate
values for $N$.
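
This behavior is easy to see numerically in the equally likely case, where expressions (7) and (12) reduce to closed forms. The sketch below assumes m = 40 equally likely values; the function name and the chosen sample sizes are illustrative only.

```python
def equal_case_mean_variance(m, n_samples):
    """Mean and variance of S when all m values are equally likely.

    Expressions (7) and (12) reduce to closed forms in this special case.
    """
    mean = m * (1.0 - 1.0 / m) ** n_samples
    variance = m * (m - 1) * (1.0 - 2.0 / m) ** n_samples - mean * (mean - 1.0)
    return mean, variance

m = 40
for n_samples in (1, 10, 40, 84, 200, 1000):
    mean, variance = equal_case_mean_variance(m, n_samples)
    print(f"N = {n_samples:4d}  mean = {mean:7.3f}  variance = {variance:6.3f}")
```

At N = 1 the variance vanishes and the mean equals m - 1; at large N both quantities shrink toward zero, with the variance largest somewhere in between.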

One  would  be  tempted  to  use  theoretical  mean  and  variance  in  a  normal  approximation 
for  the  distribution  of  the  number  not  observed.  In  general  usage,  the  normal  would  not 
be  appropriate,  because  the  Central  Limit  Theorem  is  not  applicable.  Recall,  S  is  the  sum 
of  Bernoulli  random  variables  that  are  neither  independent  nor  identically  distributed  in 
conjunction  with  samples  taken  with  replacement  from  an  infinite  population. 

The  use  of  the  normal  for  this  specific  lottery  problem  just  happens  to  accurately  mimic 
the  distribution  of  counts.  A  measure  of  the  strength  in  using  the  normal  is  given  by  the 
critical  level  for  rejection  (i.e.,  level  at  which  one  is  forced  to  reject  a  claim  of  normality), 
and  in  this  particular  situation  it  becomes  0.36  as  measured  by  the  Chi-Square  statistical 
procedure. This suggests that, for a selected range of values for N and m or the ratio N/m,
the normal may adequately approximate the distribution. On the other hand, the quality of a
normal approximation in this case may be just a coincidence.


4.  Linkage  to  Live-Fire  Sampling  Question


Suppose in a live-fire test shot there are $l$ components susceptible to possible loss, i.e.,
having component probabilities of kill, $q_i$, between 0 and 1, by the action of damage
mechanisms produced by an encounter between a munition and an armored vehicle. Then,
there are $2^l$ component-damage vectors. Each component vector, under the assumption of
component independence, has a state probability given by

$$p_k = \prod_{i=1}^{l} q_i^{\delta_i} (1 - q_i)^{1 - \delta_i} \tag{13}$$

for $k = 1, 2, \ldots, 2^l$ and where $\delta_i$ takes on a value of 0 or 1 depending upon whether the $i$th
component, respectively, survives or becomes nonfunctional.
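
As an illustration of expression (13), the following sketch enumerates every component-damage vector for a small number of components and multiplies out its state probability. The function name and the variable `q` for a component probability of kill are naming assumptions; the common value 0.7 for five components is borrowed from the third example in Section 5.

```python
from itertools import product

def state_probabilities(kill_probs):
    """Probability of every component-damage vector under independence.

    Returns a dict mapping each vector (1 = nonfunctional, 0 = survives)
    to its state probability, following expression (13).
    """
    states = {}
    for vector in product((0, 1), repeat=len(kill_probs)):
        prob = 1.0
        for delta, q in zip(vector, kill_probs):
            prob *= q if delta == 1 else 1.0 - q
        states[vector] = prob
    return states

# Five components, each with a 0.7 probability of loss.
states = state_probabilities([0.7] * 5)
print(len(states), "states, total probability", round(sum(states.values()), 12))
print("smallest:", round(min(states.values()), 5))
print("largest: ", round(max(states.values()), 5))
```

Full enumeration doubles in size with each added component, so this approach is only practical for small component counts.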

Using state probabilities in expression (1) with $m = 2^l$, the $N$ samples have a multinomial
sampling distribution. Then, $S$ is the number of distinct component-damage states not
observed. If $W$ is denoted as $2^l - S$, then $W$ represents the number of states actually
observed. So, $W$ will have mean and variance given by

$$\mu[W] = 2^l - \mu[S] \tag{14}$$

$$\sigma^2[W] = \sigma^2[S] \tag{15}$$

in terms of $S$, or more explicitly as

$$\mu[W] = \sum_{k=1}^{2^l} \left[ 1 - (1 - p_k)^N \right] \tag{16}$$

$$\sigma^2[W] = \sum_{k=1}^{2^l} \sum_{\substack{r=1 \\ k \neq r}}^{2^l} (1 - p_k - p_r)^N - U(U - 1) \tag{17}$$

with $U = 2^l - \mu[W]$. $W$ will be an increasing function of sample size. At $N = 1$, $\mu[W]$
will be 1, and as $N$ increases, it will asymptotically approach $2^l$. Like $S$, $W$ will be nearly
constant at both small and large values for $N$, but will have more variability at intermediate
values for $N$.
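
Expression (16) is simple to evaluate once the state probabilities are in hand. The sketch below assumes equally likely states (every component lost with probability 0.5) purely for illustration; the function name and the chosen sample sizes are assumptions.

```python
def expected_observed_states(state_probs, n_samples):
    """Expected number of distinct states observed, expression (16)."""
    return sum(1.0 - (1.0 - p) ** n_samples for p in state_probs)

# Equally likely states: 5 components give 32 states, 10 give 1024.
for n_components, n_samples in ((5, 1), (5, 10), (5, 150), (10, 1000)):
    n_states = 2 ** n_components
    probs = [1.0 / n_states] * n_states
    mean = expected_observed_states(probs, n_samples)
    print(f"{n_components:2d} components, N = {n_samples:4d}: "
          f"mean = {mean:8.3f}, fraction = {mean / n_states:.3f}")
```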


5.  Examples 

Four  examples  that  have  been  chosen  to  be  simple,  yet  informative  will  now  be  considered. 
They  will  be  presented  as  two  pairs,  so  the  examples  can  more  easily  be  compared,  as  well 
as  contrasted. 

The first pair consists of two examples that differ only in the number of components
susceptible to loss. The first example has 5 components, while the second has 10
components. In both examples, each component has a 0.5 probability of loss (i.e., a fair
coin-flip situation). There are 32 states in the first example, where each state has the
same probability of occurrence, i.e., 1/32. In the second example, there are 1024 equally
likely states, where each has the same state probability of 1/1024. The expected number
of observed states at selected values for N is shown in Table 3. In both examples, the
theoretical mean initially exhibits a nearly linear growth until it sharply slows. To obtain
a similar fraction of all states, more samples are required for the 10-component case than for
the 5-component case.


Table 3. First Pair of Examples: The Effect of Changing Only the Number of
Components

Five Components
All Component Probabilities: 0.5
32 States
All State Probabilities: 1/32

    N     $\mu[W]$    $\mu[W]/2^l$
   10       8.705       0.272
   20      15.042       0.470
   30      19.655       0.614
   50      25.458       0.796
   70      28.533       0.892
   90      30.163       0.943
  120      31.291       0.979
  150      31.726       0.991

Ten Components
All Component Probabilities: 0.5
1024 States
All State Probabilities: 1/1024

    N     $\mu[W]$    $\mu[W]/2^l$
  250     221.917       0.217
  500     395.741       0.386
 1000     638.542       0.624
 1500     787.508       0.769
 2000     878.904       0.858
 3000     969.383       0.947
 4000    1003.441       0.980
 5000    1016.261       0.992


The third and fourth examples deal with the same number of components (five), but
differ in their components' probabilities of loss. In the third example, each component has
a 0.7 probability of loss, while the individual component probability of loss in the fourth
example is 0.9. There are the same number of component-damage states for each example.
In the third example, there are 32 different states having varying probabilities of occurrence:
1 with a 0.00243 probability, 5 with 0.00567, 10 with 0.01323, 10 with 0.03087, 5 with
0.07203 and 1 with 0.16807. Similar to the third example, the fourth also has 32 different
states with varying likelihoods of occurrence, but the list of probabilities differs from those
in the third example: 1 with a 0.00001 probability, 5 with 0.00009, 10 with 0.00081, 10 with
0.00729, 5 with 0.06561, and 1 with 0.59049. The expected number of observed states at
selected values for N is shown in Table 4 for both examples. Similar to what was exhibited
for the first pair, an initial growth of the theoretical mean is also nearly linear until it
drastically slows. Likewise, in achieving a similar number of states, the example having a
common component probability of loss of 0.9 requires many more sampling trials than
the example where the common component probability of loss is 0.7.
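
When every component shares the same probability of loss, states with the same number of lost components share the same state probability, so the expected number of observed states can be computed by grouping rather than by enumerating all states. The sketch below takes that shortcut; the function name and the symbol `q` for the common loss probability are assumptions made for illustration.

```python
from math import comb

def expected_observed_common(q, n_components, n_samples):
    """Expected number of observed states when every component has the
    same loss probability q, grouping states by the number of lost
    components instead of enumerating all of them."""
    total = 0.0
    for lost in range(n_components + 1):
        state_prob = q ** lost * (1.0 - q) ** (n_components - lost)
        n_states = comb(n_components, lost)      # states sharing this probability
        total += n_states * (1.0 - (1.0 - state_prob) ** n_samples)
    return total

for q in (0.7, 0.9):
    mean = expected_observed_common(q, n_components=5, n_samples=500)
    print(f"q = {q}: expected observed states at N = 500 is {mean:.3f}")
```

The binomial coefficients reproduce the multiplicities 1, 5, 10, 10, 5, 1 listed in the text for five components.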


Table 4. Second Pair of Examples: The Effect of Changing Only the Common
Component Probability of Loss

Five Components
All Component Probabilities: 0.7
32 States
Varying State Probabilities

    N     $\mu[W]$    $\mu[W]/2^l$
   10       7.713       0.241
   25      14.206       0.444
   50      20.010       0.625
   75      23.249       0.727
  100      25.307       0.791
  250      29.887       0.934
  500      31.400       0.981
  750      31.768       0.993

Five Components
All Component Probabilities: 0.9
32 States
Varying State Probabilities

      N        $\mu[W]$    $\mu[W]/2^l$
 [illegible]    12.000       0.375
 [illegible]    16.342       0.511
     500        19.299       0.603
    1000        21.987       0.687
    2500        25.713       0.804
    5000        27.687       0.865
   10000        29.060       0.908
   50000        31.338       0.979

In the four examples, certain behavior traits were exhibited. In all examples, almost linear
growth was exhibited for a small number of samples, and that growth slows as the number
of samples increases. The number of samples necessary to achieve a similar proportion of
the  total  states  appears  to  increase  in  the  first  two  examples  as  both  the  number  of  states 
increases  and  the  likelihood  of  state  probabilities  decreases.  In  contrast,  the  amount  of 
necessary  sampling  appears  to  increase  when  state  probabilities  become  both  smaller  and 
larger  in  the  last  two  examples.  Some  of  these  apparent  trends  are  correct  while  others 
are  false.  Indeed,  the  trends  are  a  part  of  the  subject  matter  for  consideration  in  the  next 
section. 


6.  Discussion 

If  the  examples  from  the  previous  section  are  arranged  by  number  of  trials  required  to 
obtain  a  large  percentage  of  all  states,  their  rankings  from  least  to  greatest  would  be  the  first 
example,  the  third  example,  the  second  example  and  the  fourth  example.  When  ordering 
the  smallest  state  probability  in  each  example  from  largest  to  smallest,  there  is  a  one-to-one 
correspondence between ordered probabilities and example rankings: 1/32 in first example,
243/100000 in third, 1/1024 in second and 1/100000 in fourth. This correspondence is not
a coincidence.

A reason for this correspondence can be extracted from expression (16), which is shown
below for the convenience of the reader,

$$\mu[W] = \sum_{k=1}^{2^l} \left[ 1 - (1 - p_k)^N \right].$$

The key here is the summands. Each summand is the probability of observing the
corresponding state at least once in N trials. When a state probability is small, more sampling trials
are required before a successful occurrence becomes likely. The rate by which the mean
approaches $2^l$ will be governed by values for the state probabilities. The asymptotic approach
rate can be drastically slowed when state probabilities are small. So, states having the
smallest probabilities of occurrence will be the driving force.
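
The effect of the smallest state probability can be made concrete by asking how many trials are needed before a single summand reaches a given level. The sketch below solves 1 - (1 - p)^N >= c for the smallest integer N; the target level c = 0.9 and the function name are illustrative assumptions, and the four probabilities are the smallest state probabilities of the four examples.

```python
import math

def samples_for_state(state_prob, chance):
    """Smallest N with 1 - (1 - state_prob)**N >= chance."""
    return math.ceil(math.log(1.0 - chance) / math.log(1.0 - state_prob))

smallest_state_probs = {
    "first example": 1 / 32,
    "third example": 243 / 100000,
    "second example": 1 / 1024,
    "fourth example": 1 / 100000,
}
for name, p in smallest_state_probs.items():
    n = samples_for_state(p, chance=0.9)
    print(f"{name}: p = {p:.5f}, N = {n}")
```

The required N grows roughly in inverse proportion to the state probability, which is why the ordering of the examples by sampling effort follows the ordering of their smallest state probabilities.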

For  instance,  in  the  first  two  examples,  each  state  has  the  same  likelihood  of  occurrence 
—  in  the  5  component  case  it  is  1/32  and  in  the  10  component  case  it  is  1/1024.  When 
taking  samples,  every  state  has  the  same  chance  of  being  drawn  whether  it  has  or  has  not 
been  previously  selected.  Consider  the  situation  where  sampling  continues  until  all  states 
but  one  have  been  drawn.  In  the  five-component  case,  further  draws  become  equivalent 
to  Bernoulli  sampling  where  drawing  the  unselected  state  has  a  probability  of  1/32  and 
drawing  of  previously  selected  states  has  a  probability  of  31/32.  Similarly,  the  10  component 
case  is  equivalent  to  repeated  Bernoulli  sampling  with  a  probability  of  1/1024  for  drawing 
the  unselected  state  and  a  probability  of  1023/1024  for  drawing  previously  selected  states. 
Drawing  the  unselected  state  will  generally  be  more  likely  for  the  case  with  five  components. 
Hence,  more  effort  must  be  expended  in  the  10  component  case  than  in  the  5  component 
case  to  acquire  the  last  unselected  state. 

The  apparent  influence  that  the  number  of  involved  components  has  on  sampling  is  not 
exactly  correct  from  a  strict  perspective.  The  number  does  have  influence,  which  can  be 
enormous,  but  rather  its  influence  is  exhibited  indirectly.  From  the  multinomial  sampling 
distribution,  expression  (1),  state  probabilities  must  sum  to  unity.  For  each  additional 
component,  the  number  of  states  doubles.  Due  to  the  fixed  value  for  the  summation  of 
state  probabilities,  the  likelihood  of  occurrence  for  any  state  decreases  as  number  of  states 
increases. Hence, the effort in sampling a majority of states or all states will increase because
of the smaller state probabilities that are caused by an increased number of components.

In the last two examples, which had the same number of components but different
likelihood of individual component loss, the number of trials needed to attain a similar proportion
of all states was much less in the third example, where each component has the same loss
probability of 0.7, than in the fourth example, where the common component probability was
0.9. The range of state probabilities in the third example was within the range of values in
the fourth example. There was a state having a larger probability in the fourth example that
initially appeared to have influence on the expected number of trials. This appearance of
influence is really false, because states having larger probabilities will generally be drawn more
frequently. As in the other examples, the rate by which the mean asymptotically approaches
$2^l$ will be driven by states having smaller probabilities. The primary cause will be the state
or states having the smallest likelihood of occurrence.


7.  Conclusions 

In making predictions for live-fire testing, a question can be raised on how many
samples must be taken so that observing all component-damage states is likely. This sampling
question was investigated to demonstrate the nature of the problem involved in making
predictions. As stated in the introduction, no solutions are provided at this time for the
prediction code, SQuASH; only some insights into the sampling problem are presented.

By examining some simple examples, the nature of the sampling problem was
demonstrated. The examples represented idealized situations where both the probability of loss for
components is known and independence exists between every pair of components. Although
not presented in this report, calculation of the expected number of states observed could
easily be extended to situations where independence is not present, provided that the
structure of component dependencies is completely specified. Furthermore, a careful reader will
note  that  knowledge  of  the  probability  of  loss  for  components  and  component  independence 
or  dependence  are  not  absolutely  necessary.  What  is  required  is  just  the  knowledge  of  the 
number  of  states  and  precise  value  of  the  probability  of  occurrence  for  each  state. 

Making  SQuASH  predictions  differs  from  idealized  situations  in  several  ways.  Neither 
component  probabilities  of  loss  nor  component-damage  state  probabilities  will  be  known. 
Such  probabilities  are  estimated  from  sampling.  Total  number  of  states  will  be  known 
only  when  components  are  independent.  With  dependencies  between  components,  the  total 
number  of  states  having  non-zero  probabilities  will  be  unknown.  Predictions  involving  states 
having  low  probabilities,  whether  components  are  independent  or  not,  will  generally  require 
much  more,  if  not  excessive,  sampling  so  that  all  states  can  be  selected. 

In the predictive code, state probabilities, along with the number of damage states with
non-zero probability of occurrence, are not known before any samples are taken. Estimates
for such quantities can be obtained only after sampling has been completed. The estimated
quantities are precise only if a sufficient number of samples have been selected. A
desirable upgrade to the predictive code would be some method of accurately measuring these
quantities as sampling is taking place.

Aside from quantities not being completely specified, there is another issue: If all
hypothetical outcomes are not generated, there is a risk of erroneously deducing that for some
shot  SQuASH  is  invalid  or  has  serious  deficiencies  when  in  reality  an  experimental  outcome 
of  low  probability  was  actually  observed.  To  avoid  making  such  a  spurious  judgment,  all 
possible  damage  states  must  be  generated.  However,  the  effort  involved  in  attempting  to 
generate  all  possible  states  may  not  be  worth  the  ensuing  cost.  This  risk  versus  effort  could 
be  another  area  for  future  research. 